fix(dev): harden docker-desktop image delivery, drop broken --no-workers (#517, #518, #519) #521

Merged: ericfitz merged 1 commit into main from fix/dev-docker-desktop-followups on Jul 4, 2026

@ericfitz ericfitz commented Jul 4, 2026

Three follow-ups from the Docker Desktop dev-target work (#520), all in the dev-tooling layer (scripts/lib/deploy.py + dev k8s overlays). No production/Go code, no DB schema.

#517 — pre-import postgres/redis to avoid first-run cgr.dev flake

Docker Desktop's containerd pulls cgr.dev/chainguard/{postgres,redis} independently of the host Docker daemon; that first pull occasionally fails with a transient EOF, leaving pods in ErrImagePull.

  • build_and_push now docker pulls the base images on the host and imports them into the node's containerd alongside the tmi-* images.
  • Postgres/redis manifests pinned to imagePullPolicy: IfNotPresent so the imported copy is used (a :latest tag otherwise defaults to Always and re-pulls, defeating the import). Redis pin is a per-overlay kustomize patch because redis.yml is shared with k3s (which remaps redis to redis:7-alpine); postgres is pinned directly in docker-desktop/postgres.yml (applied raw by deploy.py).
  • Base-image set is DB-aware: oracle uses external ADB and deploys no Postgres pod, so only redis is imported there.
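A minimal sketch of the DB-aware selection described in the last bullet. The function and constant names are hypothetical (the actual helper in scripts/lib/deploy.py is not shown here), and "postgres" as the non-oracle DB value is an assumption; only the oracle behavior and the image names come from the description above.

```python
# Hypothetical names; the real helper in scripts/lib/deploy.py may differ.
POSTGRES_IMAGE = "cgr.dev/chainguard/postgres"
REDIS_IMAGE = "cgr.dev/chainguard/redis"


def base_images_for(db: str) -> list[str]:
    """Return the base images to pull on the host and import into the node."""
    images = [REDIS_IMAGE]  # redis is imported for every DB backend
    if db != "oracle":
        # oracle uses an external ADB, so no Postgres pod is deployed
        images.insert(0, POSTGRES_IMAGE)
    return images
```

With this shape, the oracle case needs no special handling at the call site: the caller iterates whatever list comes back.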

#518 — remove the broken --no-workers path

The flag applied raw leaf manifests (image: localhost:5000/tmi-*:dev) that only worked against the retired kind local registry and yield ErrImagePull on docker-desktop/k3s. No make target passed it, so it was dead code reachable only by manual invocation. Removed the flag from devenv.py, the no_workers params on start/restart/apply_overlay, and _no_workers_files.

#519 — harden import_image_to_node against a pipe hang

If the importer Popen raised before saver.stdout.close(), the parent kept the pipe's read end open and saver.wait() could deadlock once the buffer filled. The importer Popen is now wrapped so saver's stdout is released (and saver killed) on any exception before the wait.
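The teardown pattern can be sketched roughly as follows. This is not the code from import_image_to_node; the function name and the generic command lists are assumptions for illustration, and only the saver/importer pipe arrangement comes from the description above.

```python
import subprocess
import sys


def pipe_save_into_import(save_cmd: list[str], import_cmd: list[str]) -> int:
    """Stream save_cmd's stdout into import_cmd's stdin; return the importer's exit code."""
    saver = subprocess.Popen(save_cmd, stdout=subprocess.PIPE)
    try:
        try:
            importer = subprocess.Popen(import_cmd, stdin=saver.stdout)
        except BaseException:
            # The importer never started, so nothing will drain the pipe.
            # Release our read end and stop the saver before anyone waits on it.
            saver.stdout.close()
            saver.kill()
            raise
        # The importer holds its own copy of the read end; drop ours so the
        # saver sees a broken pipe if the importer exits early.
        saver.stdout.close()
        return importer.wait()
    finally:
        saver.wait()  # safe on both paths: the saver can no longer block on a full pipe
```

The key point is ordering: on the failure path the pipe is closed and the saver killed before the `finally` block reaches `saver.wait()`, so the wait cannot block on a saver that is itself blocked writing to a full pipe.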

Verification

  • make test-dev-scripts — 94 pass (added tests for DB-aware base-image selection and the import teardown path; removed the --no-workers tests).
  • make lint — pass.
  • kubectl kustomize renders IfNotPresent on redis across all three docker-desktop overlays; k3s unaffected.
  • ⚠️ The runtime behavior of #517 (actually dodging the cgr.dev flake) can only be confirmed with a live make dev-up CLUSTER=docker-desktop on a fresh node.
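The IfNotPresent pin checked above matters because of how Kubernetes fills in imagePullPolicy when a container omits it. A small sketch of that defaulting rule follows; the function name is hypothetical and the image-reference parsing is deliberately simplified.

```python
def default_pull_policy(image: str) -> str:
    """Pull policy Kubernetes assigns when a container omits imagePullPolicy."""
    if "@" in image:
        return "IfNotPresent"  # pinned by digest
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port
    _, sep, tag = last_segment.partition(":")
    if not sep or tag == "latest":
        return "Always"  # untagged or :latest
    return "IfNotPresent"
```

Under this rule an untagged or :latest cgr.dev/chainguard image resolves to Always, which would trigger a fresh pull and bypass the copy imported into the node, whereas a tagged image such as redis:7-alpine resolves to IfNotPresent.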

Closes #517
Closes #518
Closes #519

🤖 Generated with Claude Code

…orkers (#517, #518, #519)

Three follow-ups from the Docker Desktop dev-target work (#520), all in the
dev-tooling layer (scripts/lib/deploy.py + dev k8s overlays).

#517 — pre-import postgres/redis base images to avoid the first-run cgr.dev
pull flake. Docker Desktop's containerd pulls cgr.dev/chainguard/{postgres,redis}
independently of the host Docker daemon, and that first pull occasionally fails
with a transient EOF, leaving the pods in ErrImagePull. build_and_push now
`docker pull`s the base images on the host and imports them into the node's
containerd alongside the tmi-* images, and the postgres/redis manifests are
pinned to imagePullPolicy: IfNotPresent so the imported copy is used (a :latest
tag otherwise defaults to Always and re-pulls, defeating the import). The
redis pin is a per-overlay kustomize patch (redis.yml is shared with k3s, which
remaps redis to redis:7-alpine); postgres is pinned directly in the
docker-desktop postgres.yml (applied raw by deploy.py). The base-image set is
db-aware: oracle uses an external ADB and deploys no Postgres pod, so only redis
is imported there.

#518 — remove the --no-workers bring-up path. It applied the raw leaf manifests
(image: localhost:5000/tmi-*:dev), which only worked against the retired kind
local registry and yields ErrImagePull on docker-desktop/k3s. No make target
passes it, so it was developer-manual-only dead/broken code. Dropped the flag
from devenv.py, the no_workers params from start/restart/apply_overlay, and the
_no_workers_files helper.

#519 — harden import_image_to_node against a Popen-raises-before-close hang. If
the importer Popen raised before saver.stdout.close() ran, the parent kept the
pipe's read end open and saver.wait() in the finally could deadlock once the
pipe buffer filled. The importer Popen is now wrapped so saver's stdout is
released (and saver killed) on any exception before the wait.

Unit tests added for the db-aware base-image selection and the import teardown
path; the --no-workers tests were removed. make test-dev-scripts (94) and
make lint pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX
@ericfitz ericfitz merged commit a9938a8 into main Jul 4, 2026
12 checks passed
@ericfitz ericfitz deleted the fix/dev-docker-desktop-followups branch July 4, 2026 17:44
ericfitz added a commit that referenced this pull request Jul 4, 2026
The dev environment was migrated to Docker Desktop Kubernetes (#520/#521),
but CLAUDE.md still described a kind cluster with the database and Redis
running as containers "external to the cluster." That stale description
caused a misdiagnosis when the in-cluster PostgreSQL PVC came up empty:
the real data was assumed lost when it was actually stranded in the old
host Docker volume the new topology no longer mounts.

Update both occurrences to reflect reality:
- Default CLUSTER=docker-desktop (k3s also supported)
- server, PostgreSQL, Redis, and NATS all run in-cluster in the
  tmi-platform namespace (Deployments + StatefulSets)
- PostgreSQL data persists in a Kubernetes PVC (data-postgres-0), NOT a
  host Docker volume; re-provisioning the PVC starts from an empty DB
- With DB=oracle the database is an external managed Oracle ADB
- Orchestration is via scripts/devenv.py; manifests live under
  deployments/k8s/dev/<cluster>/


Claude-Session: https://claude.ai/code/session_01Kk9GxWS9EpazjbwBKfMpUX

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>